A category based approach for recognition of out-of-vocabulary words
Abstract
In almost all applications of automatic speech recognition, especially in spontaneous speech tasks, the recognizer vocabulary cannot cover all occurring words. There is always a significant number of out-of-vocabulary words, even when the vocabulary is very large. In this paper we present a new approach for integrating out-of-vocabulary words into statistical language models. We use category information for all words in the training corpus to define a function that approximates the out-of-vocabulary word emission probability for each word category. This information is integrated into the language models. Although we use a simple acoustic model for out-of-vocabulary words, we achieve a 6% reduction in word error rate on spontaneous speech data with an out-of-vocabulary rate of about 5%.
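The abstract does not say how the per-category function is defined. As a rough illustration only, the sketch below assumes one common choice that is not taken from the paper: a Good-Turing-style estimate, in which the probability of meeting an unseen word in a category is approximated by the share of that category's training tokens that occur exactly once. The function name `oov_probability_by_category`, the category labels, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def oov_probability_by_category(tagged_corpus):
    """Approximate P(OOV | category) from (word, category) training pairs.

    Assumption (not from the paper): use the Good-Turing unseen-mass
    estimate N1 / N, where N1 is the number of words seen exactly once
    in the category and N is the number of tokens in the category.
    """
    word_counts = defaultdict(Counter)
    for word, category in tagged_corpus:
        word_counts[category][word] += 1

    estimates = {}
    for category, counts in word_counts.items():
        total_tokens = sum(counts.values())
        singletons = sum(1 for c in counts.values() if c == 1)
        estimates[category] = singletons / total_tokens
    return estimates


# Hypothetical toy corpus: an open category (city names) and a closed one.
corpus = [
    ("berlin", "CITY"), ("hamburg", "CITY"), ("berlin", "CITY"), ("bonn", "CITY"),
    ("the", "FUNCTION"), ("the", "FUNCTION"), ("a", "FUNCTION"), ("a", "FUNCTION"),
]
p_oov = oov_probability_by_category(corpus)
print(p_oov)
```

On this toy data the open category receives a high estimate and the closed category receives zero, which is the kind of per-category contrast the abstract relies on. A real system would still need smoothing and a way to feed these values into the language model.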
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Online Persian Handwriting Recognition Using a Language Model and Reduced User Writing Rules
The joined-up, cursive form of Persian words, the immense variety of its scripts, and the different shapes Persian letters take depending on their position in a word have made Persian handwriting recognition a serious challenge. The major shortcoming of most recognition methods is their inattention to sentence context, which causes the use of a word with correct appea...
A Category Based Approach for Recognition of Out-of-Vocabulary
The research project underlying this report was funded by the Federal Minister for Education, Science, Research and Technology under grant number 01 IV 102 H/0. Responsibility for the content of this work lies with the authors. ABSTRACT In almost all applications of automatic speech recognition, especially in spontaneous speech tasks, the recognizer ...
Recognition of Out-of-vocabulary Words and Their Semantic Category
In almost all applications of automatic speech recognition, especially in spontaneous speech tasks, the recognizer vocabulary cannot cover all occurring words. There is always a significant amount of out-of-vocabulary (OOV) words even when the vocabulary size is very large. In this paper we present a new approach for the integration of OOV words into statistical language models. It is based on t...
Recognition of out-of-vocabulary words with sub-lexical language models
A major source of recognition errors, out-of-vocabulary (OOV) words are also semantically important; recognizing them is, therefore, crucial for understanding. Success, so far, has been modest, even on very constrained tasks. In this paper we present a new approach to unlimited vocabulary speech recognition based on using grapheme-to-phoneme correspondences for sub-lexical modeling of OOV words,...
Publication date: 1996